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ABSTRACT 

Whenever observations are compared to theories, an estimate of the uncertainties as- 
sociated with the observations is vital if the comparison is to be meaningful. However, 
many or even most determinations of temperatures, densities and abundances in pho- 
toionized nebulae do not quote the associated uncertainty. Those that do typically 
propagate the uncertainties using analytical techniques which rely on assumptions 
that generally do not hold. 

Motivated by this issue, we have developed neat (Nebular Empirical Analysis 
Tool), a new code for calculating chemical abundances in photoionized nebulae. The 
code carries out a standard analysis of lists of emission lines using long-established 
techniques to estimate the amount of interstellar extinction, calculate representative 
temperatures and densities, compute ionic abundances from both collisionally excited 
lines and recombination lines, and finally to estimate total elemental abundances using 
an ionization correction scheme, neat uses a Monte Carlo technique to robustly prop- 
agate uncertainties from line fiux measurements through to the derived abundances. 

We show that for typical observational data, this approach is superior to analytic 
estimates of uncertainties, neat also accounts for the effect of upward biasing on 
measurements of lines with low signal to noise, allowing us to accurately quantify the 
effect of this bias on abundance determinations. We find not only that the effect can 
result in significant over-estimates of heavy element abundances derived from weak 
lines, but that taking it into account reduces the uncertainty of these abundance 
determinations. Finally, we investigate the effect of possible uncertainties in R, the 
ratio of selective to total extinction, on abundance determinations. We find that the 
uncertainty due to this parameter is negligible compared to the statistical uncertainties 
due to typical line fiux measurement uncertainties. 

Key words: ISM: abundances - atomic processes - methods: statistical 



1 INTRODUCTION 



Abundance determinations from photoionized nebulae play 
crucial roles in a variety of astrophysical contexts. Nebu- 
lae around evolved stars, e.g. Planetary Nebulae (PNe) or 
Wolf-Rayet (WR) ejecta nebulae provide constraints on the- 
ories of stellar nucleosynthesis and evolution ( e.g. [Karakas 
erar]|2009l [Magrini et al. 2011] |Maeder|[T992l [Stock et al.' 



Studies of photoionized nebulae have a very long his- 



2011| . Such data is invaluable as constraints on stellar yields; 
the inputs of galactic chemical evolution models. Meanwhile, 
Galactic and extragalactic H ii regions provide insights into 
the current composition of the ISM and therefore are vi- 
tal constraints for the output of galactic chemical evolution 
models (e.g. |Pagel||1997| |Matteucci||2003 ). 



tory, stretching back to the dawn of astrophysics (e.g. Hug- 
gins fc Miller|1864 ), but some major questions remain unan- 
swered. One example is the sometimes sizeable discrepancy 
between abundances derived from collisionally excited lines 
(CELs) and those derived from recombination lines (RLs) 
(Wesson et~aLl|20Q5| |Liu et al.|[2006l [Garcia-Rojas k Este^ 



ban||2007| [Tsamis et al.||2Q08 ) 



To meaningfully assess whether results determined ob- 
servationally are consistent or discrepant with model pre- 
dictions, the uncertainties on both the observations and the 
model must be estimated. In neither case is such an estimate 
straightforward. The uncertainty on the observations is a 
combination of statistical uncertainties, relating ultimately 
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to measurement uncertainties, and systematic uncertainties, 
which may arise from many sources including the instrumen- 
tation used to obtain the hne flux measurements, the cho- 
sen parametrisation of the interstellar reddening law, and 
the methodological choices made in the analysis. Sources of 
systematic uncertainty may be unknown and even unquan- 
tifiable. Finally, the uncertainty attributed to models is per- 
haps the most difficult to quantify, being related ultimately 
to the confidence the modeller has that the model encapsu- 
lates the underlying physics without excessive simplification 
or unwarranted assumptions. 

We concern ourselves in this paper with an improved 
understanding of statistical uncertainties and their effect on 
empirically determined nebular abundances. We present a 
new code for calculating chemical abundances in photoion- 
ized nebulae, which can also robustly estimate statistical 
uncertainties using a Monte Carlo approach. This method is 
inherently superior to analytic methods of uncertainty prop- 
agation, which rely on assumptions which are almost always 
violated by the measurement uncertainties inherent to ac- 
tual astronomical observations. 

We use our code to reanalyse several published line lists 
for which uncertainties are available, and we show firstly 
that analytic methods do not give a good representation 
of the true uncertainties on derived quantities. Generally, 
uncertainties on derived abundances are better characterised 
by log- normal distributions than by normal distributions. In 
some cases, bimodal probability distributions emerge. 

Secondly, as the Monte Carlo approach can trivially 
handle any quantifiable distribution of line flux uncertain- 
ties, we have designed the code to account for the well known 
upward bias in measurements of weak lines, and associated 
log-normal distribution of measurement uncertainties ( |Rola| 
I^P elat 1994). The bias is well known but its actual effect 
on abundance determinations has not previously been rigor- 
ously quantified. We show, as expected, that abundances de- 
termined from weak lines are systematically overestimated. 
Moreover, we show that in addition to removing this bias, ac- 
counting for the effect leads to a significant reduction in the 
uncertainty of abundance determinations from weak lines. 

Finally, we investigate the effect of an assumed statisti- 
cal uncertainty in R, the ratio of selective to total extinction. 
We show that the extra uncertainty introduced into abun- 
dance determinations by taking this parameter to be 3.1 it 
0.15 instead of a fixed value of 3.1 is negligible compared to 
the uncertainties arising from line flux measurements, even 
when the extinction is quite large and the line fluxes are well 
measured. 

The words "uncertainty" and "error" are often used 
synonymously. In this article we maintain a distinction in 
meaning between the terms: "uncertainty" refers to the lim- 
iting accuracy of the knowledge of a quantity, while "error" 
refers to an actual mistake. 



2 STATISTICAL UNCERTAINTIES 

Uncertainties in observed quantities can be propagated into 
the uncertainty on derived parameters in a number of differ- 
ent ways, the two most common of which are the traditional 
analytical technique based on systems of partial derivatives 
and simplifying assumptions that allow one to apply Taylor 



expansions, and the 'Monte Carlo' method which is a brute- 
force iterative method that exploits the wealth of computa- 
tional power now readily available by building on knowledge 
of the uncertainty in the original observations. 

The analytical approach is as follows: if one has mea- 
sured a quantity x with some uncertainty ax, and wishes 
to calculate the uncertainty in a quantity F given that 
F = /(x), then the uncertainty on F can be computed via 
the relation 



aF 
ax 



9/ 
dx 



(1) 



However, in general one may not be able to compute 
this partial differential exactly, and in these cases, provided 
that 



^1 



« fixi 



(2) 



it is possible to use a first order Taylor expansion to 
approximate this derivative. If one had a third value, H — 
h{f, g) where / and g are both functions of other variables 
and are statistically independent of one another, it would be 
necessary to find the total derivative dH such that 



dh' 



dh 
df 



df + 



dg 



and it thus follows that 
dh 



2 I dh 



dg 



erg 



(3) 



(4) 



This expression can be generalised to any number of 
variables, and gives rise to the usual quadrature formulae for 
many simple functions of x. We highlight three key aspects 
of this approach: 

(i) Given the number of formulae through which the orig- 
inal line flux data must be put before an abundance can be 
determined, and the wide variety of functions applied, the 
equations necessary to propagate uncertainties in this way 
can become extremely complex; 

(ii) the approach implicitly assumes that all the input and 
output uncertainties at each step are normally distributed; 

(iii) the Taylor expansion requires that the uncertainties 
be small relative to the quantities. 

The first point is a matter of convenience but none 
the less one which discourages many authors from even at- 
tempting to propagate uncertainties all the way into the 
final quantities. The second and third points are clearly vi- 
olated in many or most real astrophysical observations, by 
virtue of which the uncertainties estimated using analytical 
techniques are liable not to reflect the true uncertainties. 

As an example of the cumbersome nature of the analytic 
approach, we consider the estimation of the uncertainty on 



c(H/3). As described in Section 3.2 our code calculates c(H/3) 
from the flux- weighted average of values derived from Ha, 
H7 and H^. Using the analytic approach described above, 
one finds the uncertainty on the unweighted c(H/3) as fol- 
lows. Firstly, 



c{Hp) 



I{Ha)c{HI3)a. + I{H-f)c{HI3)^ + I{H6)c{H^)s 



I{Ha) + I{H-f) + I{H5) 



(5) 



where 
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c{Hl3)o. = 



^^9 {t^) 



f{Ha) 



(6) 



and similarly for c(H/3)7 and c(H/3)(5. The analytically 
estimated uncertainty on c(H/3) derived using this method 
is 



3 NEAT: NEBULAR EMPIRICAL 
ABUNDANCE TOOL 

We have developed a new code, neat (Nebular Empirical 
Analysis Tool), to quickly carry out a thorough analysis of 
emission line spectra, and to propagate statistical uncer- 
tainties using the Monte Carlo technique described above. 
In this section we describe the code and how it works. 



C(HI3 \ (c(H(3)^+c(H(3)^+c(H(3)s)^ 



+ 



{I{Hcy)+I{H-f)+I{H5))'^ 



(7) 



where 



Ac(if/3). =c(if/3). X U^^) 



+ 



0.434 X (A /^5"\ ) 



l0(-Ff«) 



iQJHc^)' 



(8) 
(9) 



and similarly for Ac{H/3)^ and Ac{H/3)s. The mere 
derivation of such a formula is tiresome and complicated, 
and it would be exceedingly easy for errors to be introduced. 
Even this is somewhat simplified: this equation assumes zero 
uncertainty in the values of f(Ha, 7, 6) and lo{Ha, 7, S). The 
uncertainty estimated for c(H/3) then has to be propagated 
into the complex functions of line ratios which are used to 
give temperatures and densities, and then into the calcula- 
tions of ionic abundances. 

The Monte Carlo method, on the other hand, avoids all 
of the pitfalls of the analytical method. It exploits the fact 
that if an observation of a quantity is drawn from a distri- 
bution X, with mean x and variance ax , and if one knows 
(or can make sensible assumptions about) the shape of this 
distribution, then a random-number generator can be used 
to repeatedly draw values from X, creating a random sample 
from it. Using the above example of F = f{x), if one wanted 
to examine the uncertainty in F, the operation f(x) could be 
performed upon every value in the aforementioned sample to 
produce a sample of the distribution from which F is drawn, 
which can then be parametrised to estimate the type, mean 
and standard deviation of this distribution. This process can 
be repeated ad infinitum for any number or combination of 
functions of F, x, or any other variable derived in the same 
way to propagate the uncertainties on the quantities, irre- 
spective of the size of the uncertainties and any statistical 
interdependence of the variables. This approach thus has the 
following advantages over the analytical approach: 

(i) It is inherently very simple; 

(ii) Any distribution of uncertainties can be propagated 
at any stage; 

(iii) it does not require relative uncertainties to be small 

The Monte Carlo approach is thus inherently robust 
when applied to real astrophysical data, in a way that the 
analytical approach is not. The only limitation is then the 
time taken to run the calculation enough times to sample 
well the statistics of the output distributions. 



3.1 Input 

NEAT incorporates elements of several previous codes, most 
importantly the EQUIB code, also developed at UCL, 
for solving the equations of statistical equilibrium in 
multi- level atoms. All the source code, documentation, 
atomic data and example line lists are freely available at 
http : //www . sc . eso . org/~rwesson/codes/neat/. The code 
is designed to be as simple as possible from the user's per- 
spective, and our aim is that the user should be able to 
simply pass the code a line list and get out abundances de- 
termined by an objective methodology in which the user 
need not make any choices. The code has no external de- 
pendencies and should compile without problems on any 
Unix-based system. We have also verified that it compiles 
and runs on Windows systems, should the user be restricted 
to such an OS. 



3.1.1 Atomic data 

Hydrogen recombination data from [Storey fc Hummer] 



Smits 



(1996) 



(1995| and helium recombination data from 

are used throughout. Helium abundances are corrected for 

collisional effects using the formulae provided by Kingdon| 



& Ferland (1995) 



All atomic data for heavier elements is stored externally 
in plain text files, so that the user can easily change the 
atomic data being used without needing to edit or recom- 
pile the code. We provide three sets of atomic data for col- 
lisionally excited lines with the code - a compilation from a 
variety of sources compiled on an ad hoc basis, CHIANTI 5.2 
( [Landiet al.pOOGi , and CHIANTI 6.0 ( |Dere et al.|2009t . In 
all the analyses presented in this paper, we used atomic data 
from CHIANTI 5.2, with the exception of data for 0+ for 
which a documented error exists in the CHIANTI 5.2 data 
( Kisielius et al.|2009 ), and S^^, which we believe is affected 
by a similar error. For O^ we used transition probabilities 
from IZeippen (1982) and collision strengths from Pradhan 



( 1976|); for S^^ w e used transition probabilities from 
doza & Zeippen ( 1982| and collision strengths from 



Men- 



Men- 



doza (1983). For recombination lines, we use data from the 



sources given in Table [T] 

3.1.2 Line list format 

NEAT requires as input a plain text list of rest wavelengths, 
line fluxes, and uncertainties, neat currently recognises 738 
lines, 81 of which are collisionally excited lines and 657 of 
which are recombination lines. The code assigns line IDs 
based on an exact match to its reference list of rest wave- 
lengths. Different sources of atomic data often quote slightly 
different rest wavelengths for a given line; we include with 
NEAT a separate code to read in line lists and reassign rest 
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Ion 



Recombination data source 



C2 + 
C3 + 

N2 + 
N3 + 

02+ (3s-3p) 

02+ (3p-3d, 3d-4f) 

Ne2+ (3s-3p) 
Ne2+ (3d-4f) 



Davey et al. 



( 20p0|> 

aftlfl99l | 
Victor ( 1 990| 

al.|( |199l| 



Pequ ignot"et al.l 11991 
Escalante & Victor ( 1990| 
'Pequignot et al.il 
Storey (1994) 
Liu et a lj(|1995|> 
KisieliusetaT ]1998| > 
Storey (unpublished) 



Table 1. Atomic data used in NEAT for recombination lines 

wavelengths to the values which NEAT recognises. Lines 
which are not recognised are ignored in the analysis. 

Some cases of line blends can be accounted for; for ex- 
ample, the [O ii] lines at 3726 and 3729A may be blended 
in low resolution spectra. In this case, a rest wavelength of 
3727.00 can be given, and the code will properly treat the 
combined flux. Blends of recombination lines cannot cur- 
rently be accounted for but we plan to develop means of 
doing so in future versions of neat. 

For a very small number of recombination lines, close 
coincidences in rest wavelengths may lead to misidentifica- 
tions. For example, three weak O ii 3d-4f transitions from 
multiplets V63c, V78b and V63c have rest wavelengths of 
4315.39, 4315.39 and 4315.40 respectively. These three lines 
will almost certainy be detected as a blend if detected at 
all. At the moment, neat would attribute all flux at a wave- 
length of 4315.39 to each of the three lines. The effect on final 
abundances should this occur is likely to be very small, as 
only very weak RLs are affected. Again, we plan to improve 
the sophistication of neat's approach in future versions. 

The user can select the number of iterations of the code 
to run. If the number of iterations is one, the code performs 
a standard empirical analysis on the line list, as described 
below, and does not calculate any uncertainties. If the num- 
ber of iterations is more than one, then the code randomises 
the line list before each iteration. The standard analysis is 
then carried out on the randomised line list. 



3.1.3 Line flux randomisation 

The code randomises the line fluxes by assuming that they 
come from one of three distributions, depending on the mea- 
sured signal to noise. The three cases are: 

SNR>6.0: the line flux is assumed to be drawn from 
a Gaussian distribution. In this case, if the given line flux 
measurement is F and the given measurement uncertainty is 
a, then in each iteration of the code the line flux is calculated 
using 



F^ = F+{Rxa) 



(10) 



where R is a random number drawn from a Gaussian 
distribution with a mean of zero and a standard deviation 
of unity. 

1.0<SNR<6.0: 



Rola &; Pelat (1994 hereafter RP94) 



observed that measurements of weak lines (SNR<6) are 
strongly biased upwards, and the uncertainty on such lines 
cannot be well represented by a normal distribution. We re- 
fer to this effect henceforth as the RP effect. Taking the 
RP94 effect into account is straightforward with a Monte 



Carlo approach, requiring only that a log-normal distribu- 
tion with appropriate mean and standard deviation is used 
for weak lines, instead of the normal distribution used for 
strong lines. 

For lines with F/a <6, we randomise the line fluxes 
using the following procedure: we first determine the log- 
normal distribution appropriate to the measured signal-to- 
noise ratio. RP94 gave parameters for the log-normal distri- 
butions of Fobs/Ftrne, as a function of the observed SNR, 
and we fitted the following equations to the parameters in 
their Table 6: 



0.0765957 1.86037 

M= o — + 



snr'' 
-1.11329 



0.309695 



snr 
1.8542 0.288222 



+ 0.18018 



(11) 

(12) 



snr-^ snr'' snr 

From the observed SNR we thus obtain the appropriate 
/i and a. Then, the line flux is found using the following 
equation: 

F 

^^ ^ g(HXCT + /x) (^^) 

where R is a random number from a Gaussian distribu- 
tion as before. The result of this procedure is that weaker 
lines are drawn from log-normal distributions that peak be- 
low the observed value, with the effect increasing as SNR 
decreases. 

SNR<1.0: If the quoted uncertainty is larger than 
the actual line flux, the code assumes that the quote flux 
represents a 5cr upper limit, and thus draws the randomized 
line flux from a folded Gaussian distribution, with /i=0 and 
cr=0.2xF. The line flux is given by 

Fi = abs{R) X 0.2F (14) 

Figure [l] shows the various possibilities that arise de- 
pending on the measured SNR. The distributions plotted 
are from 10^ runs of NEAT in which 7 artificial line fluxes 
were randomized. Each of the lines had a measured flux of 
10.0, and the quoted uncertainties represented SNRs of 1.0, 
2.0, 3.0, 4.0, 5.0, 6.0 and 8.0. The figures shows how the 
normal distribution appropriate at high SNR is replaced by 
a log-normal distribution increasingly skewed towards lower 
values as SNR decreases. 



3.1.4 Random number algorithm 

This process relies on the FORTRAN random_number 
command, seeded using the system clock and the time of the 
code's execution, to generate pseudo-random numbers. The 
uniformly distributed numbers thus generated are then con- 
verted into a Gaussian distribution using an algorithm based 
on the ratio of uniforms method of Kinderman & MonahanJ 
( |1977{ ), available from netlib.org. We tested the perfor- 
mance of this method by running the code 1 000 000 times, 
and plotting the distribution of fluxes obtained for an ar- 
bitrarily selected line. We then fitted a Gaussian function 
to this distribution. We found that the recovered mean was 
within 0.008% of the specified value, while the recovered 
standard deviation was within 0.13% of the specified value. 
Figure [2] shows the histogram of generated values with the 
required Gaussian distribution overplotted. We thus con- 
sider that the random number generator in the code provides 
a reliably random Gaussian distribution. 
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SNR=1.0 
SNR=2.0 
SNR=3.0 
SNR=4.0 
SNR=5.0 
SNR=6.0 
SNR=8.0 




Figure 1. The behaviour of neat's hne flux randomization as a 
function of SNR, for a flxed measured flux of 10.0. At high SNR 
(i.e. >6), normal distributions apply. At lower SNR, the effect 
described by |Rola fc Pelat| ( |1994| ) results in log-normal distribu- 
tions which are skewed towards lower values than the measured 
intensity. If SNR^l then the code assumes that the quoted flux 
is a 5cr upper limit 




Flux (arbitrary units) 



Figure 2. A distribution of values produced by 1 000 000 runs of 
the random number generator, with the target Gaussian distri- 
bution overplotted. 



3.1.5 Sampling uncertainty 

The Monte Carlo approach relies on carrying out the analy- 
sis enough times to adequately sample the probability distri- 
butions of the derived quantities. To determine what num- 
ber of iterations suffices for this purpose, we ran the code 
1,000,000 times, using emission line fluxes measured for 



IC1747 by [Wesson et al.| ( |2005| ). We then considered sub- 
sets of the output from this run. 

To quantify the sampling uncertainty, we fitted a Gaus- 
sian to the observed probability distribution of the measured 
[O III] temperatures, for each subset of iterations. We chose 
this quantity as its actual uncertainty distribution should 
closely approximate a Gaussian. The uncertainty on the 
Gaussian fit is thus a measure of how well the output dis- 
tribution was sampled, for a given number of iterations. A 
Gaussian output makes it straightforward to quantify the 



sampling uncertainty but the magnitude of this uncertainty 
is a function only of the number of iterations and not of the 
output distribution, and so the result applies generally. 

In Figure [3] we show how the uncertainty of the fitted // 
and a vary with the total number of iterations. We find that 
the precision of the fit improves indefinitely with increasing 
number of iterations up to the limit of our investigation. In 
our own investigations we carried out 10 000 iterations on 
each line list we investigated; Figure [3] shows that by this 
number of iterations, the sampling uncertainty on the [O III] 
temperature is of the order of IK. We carried out our investi- 
gations on single processors of moderately powerful desktop 
and laptop machines, on which 10 000 iterations typically 
took around 40-60 minutes. We plan to parallelise neat to 
enable larger numbers of iterations to be carried out in a 
conveniently short time. 



3.2 Interstellar extinction 

The first step of any abundance analysis is a correction for 
interstellar extinction. The amount of extinction is deter- 
mined by NEAT from the ratios of hydrogen Balmer lines, 
in an iterative procedure: the extinction is first calculated 
assuming intrinsic Ha, H^^ and H7 line ratios for a temper- 
ature of 10 000 K and a density of 1000 cm"^ c(H/3) is cal- 
culated from the fiux-weighted average values derived from 
ratios of the three lines to their intrinsic values, and the line 
list is de-reddened using this value of c(H/3). Temperatures 
and densities calculated as described below. Then, the in- 
trinsic Balmer line ratios are recalculated at the appropriate 
temperature and density, and c(H/3) is recalculated. 

The user can select the particular extinction law to be 
used. Five extinction laws are currently available: the Galac- 



tic extinction curves of Howarth ( 1983 ), Fitzpatrick & Massa 



( |1990| ) and|CardemetaI.| ( |1989t , the Large Magellanic Cloud 
law of Howarth ( 1983), and the Small Magellanic Cloud law 
of Prevot et al.| ( |1984 ) . Adding further extinction laws would 
be straightforward, should the user wish to do so. 

In section [G] we investigate the effect on derived quan- 
tities of the likely uncertainty in R, the ratio of selective to 
total extinction. 



3.3 Temperatures and densities 

Temperatures and densities are calculated using traditional 
collisionally excited line diagnostics. For the purposes of sub- 
sequent abundance calculations, the nebula is divided into 
three "zones" , of low, medium and high excitation. In each 
zone, temperatures and densities are calculated iteratively 
and weighted according to the reliability of each diagnostic. 
Table [2] shows the diagnostics used and the weighting given 
in each zone. 

The scheme to calculate the densities is iterative, and 
proceeds as follows: 

(i) A temperature of 10,000K is initially assumed, and 
the density is then calculated from the line ratios relevant 
to the zone. 

(ii) The temperature is then calculated from the temper- 
ature diagnostic line ratios, using the derived density. 

(iii) The density is recalculated using the appropriately 
weighted average of the temperature diagnostics 
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Figure 3. The uncertainty on the parameters of gaussian fits to a NEAT uncertainty distribution for T([0 ill], as a function of the 
number of iterations. The figures show the uncertainty on the fitted values of fi (1) and a (r). We use these as a proxy for the Monte 
Carlo sampling uncertainty. 



(iv) The temperature is recalculated using this density. 

This iterative procedure is carried out successively for 
low, medium and high ionization zones, and in each case if 
no diagnostics are available, the temperature and/or density 
will be taken to be those derived for the previous zone. 



3.4 Ionic abundances 

Ionic abundances are calculated from collisionally excited 
lines using the temperature and density appropriate to their 
ionization potential. Where several lines from a given ion are 
present, the ionic abundance adopted is found by averaging 
the abundances from each ion, weighting according to the 
observed intensity of the line. 

Recombination lines are also used to derive ionic abun- 
dances for helium and heavier elements. In deep spectra, 
many more recombination lines may be available than col- 
lisionally excited lines. The code calculates the ionic abun- 
dance from each individual recombination line intensity us- 
ing the atomic data listed in Tablefl] Then, to determine the 
ionic abundance to adopt, it first derives an ionic abundance 
for each individual multiplet from the multiplet's co-added 
intensity, and then averages the abundances derived for each 
multiplet to obtain the ionic abundance used in subsequent 
calculations. 



3.6 Output 

By randomising and analysing the line list many times, it is 
possible to build up an accurate picture of the true distribu- 
tion of statistical uncertainties associated with the chemical 
abundances and empirical diagnostics resulting from the line 
flux uncertainties, neat collates all of the results from each 
iteration, and calculates uncertainties as follows: first of all 
for each output parameter it extracts the values containing 
34.1% of all results above and below the median. The data 
is then binned using a bin size of 0.05 times the difference 
between these two values. From the binned data, the mode 
of the probability distribution is obtained, and the code re- 
ports the final quantity and its uncertainties as this mode, 
and the values such that 68.2% of all results lie within the 
range of uncertainties. 

This approach does not assume any particular distri- 
bution of probabilities, but if the distribution is normal, 
log-normal or exponential-normal, then the uncertainties 
thus returned correspond to one standard deviation as nor- 
mally defined. We find that uncertainty distributions are 
not always characterised and may be bimodal or multimodal 
(see Section El, and we therefore recommend that users di- 
rectly inspect the probability distributions. We provide with 
NEAT a small shell script that produces plots for easy in- 
spection. 



3.5 Total elemental abundances 

Generally, not all ionization stages of an ion that are ac- 
tually present in a nebula will be detected, due to lim- 
ited wavelength coverage and sensitivity. Total elemental 
abundances must be estimated using ionization correction 
schemes, which are derived from photoionization models, or 
similarities in ionization potentials, or a combination of the 
two. 

The code currently includes the ICF scheme of [ Kings'^ 
burgh & Barlow (1994). We plan to incorporate further 



ICFs, and in a forthcoming paper we will compare the mag- 
nitude of the systematic uncertainties arising from the choice 
of ICF with the statistical uncertainties. 



4 THE NATURE OF ACTUAL 
UNCERTAINTIES 

In this section we discuss the nature of the uncertainties 
revealed by the Monte Carlo technique. We find that three 
distinct behaviours emerge for the uncertainties on abun- 
dances, depending on the overall depth of the line list being 
analysed. In some cases, where all lines being analysed are 
well detected, the final uncertainties are close to symmet- 
ric and can be well approximated by Gaussian distributions. 
However, in the nebulae that we have analysed, the final dis- 
tributions are better described by log-normal distributions 
than by Gaussian. 

In a minority of cases, very unusual uncertainty distri- 
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Low ionization zone 



Diagnostic 
[O ii] density 
[S II] density 



[Nil] 
[Sll] 
[On] 
[Oi] 
[Ci] 



temperature 
temperature 
temperature 
temperature 
temperature 



Lines 
A3727/A3729 
A6717/A6731 

A6548 + A6584 

A67lV+A6731 
A4068+A4076 
A7319,20+A7330,31 
A3726 + A3729 
A6363 + A6300 

A5577 
A9850 + A9824 

A8727 



Weight 
1 

1 

5 
1 
1 
1 

1 



Medium ionization zone 



Diagnostic 
[CI III] density 
density 
density 
density 
density 
density 



[Ar IV 
[C III] 
[O III] 
[Ar III 
[S III]' 



[Ne III] density 

[O III] temperature 
[Ar III] temperature 
[Ne III] temperature 
[S III] temperature 
[Ne III] temperature 
[O III] temperature 



Lines 

A5517/A5537 

A4711/A4740 

A1907/A1909 

A52/im/A88/im 

A9/im/A21.8/im 

A18.7/im/A33.8/im 

A15.5/im/A36.0/im 

A4959 + A5007 

A7lW#751 

5192 
A3868+3967 



A90( 



" + 9531 



A38( 



.312 
8+3967 



15.5um 

A4959+5007 

52/j,m 



Weight 
1 
1 
1 





4 
2 
2 
1 





High ionization zone 



Diagnostic 
[Ne iv] density 

[Ar v] temperature 
[Ne v] temperature 



Lines 

A2423 
A2425 

A6435+A7005 
A342'6+i^345 



Weight 
1 

1 

1 



A2975 

Table 2. Diagnostics used in the calculation of physical condi- 
tions. 



butions emerge which cannot be sensibly fitted by analytic 
functions. The origin of such distributions arises in some 
cases from the methodology adopted. For example, if tem- 
perature diagnostic lines are weakly detected, then in some 
fraction of neat iterations the temperature may be unde- 
fined, and thus taken as the default of 10 000 K. In other 
iterations it may have a determined value different from 
10 000 K. The temperature distribution then becomes double 
peaked, and this propagates into CEL abundances, although 
it has little effect on RL abundances. In situations like these, 
one can say that the true uncertainty distribution is broader 
even than neat suggests, and must realistically encompass 
both values of the double peaked distribution. 

At intermediate stages, different behaviour may be ob- 
served. In particular, we find that for temperatures derived 
from collisionally excited lines, the uncertainty distribution 
is sometimes well characterised by an exponential-normal 
distribution; that is, the probability of e^ is normally dis- 
tributed. In every case that we examined, though, the con- 
volution of this distribution with the processes involved in 
calculating abundances resulted in a final abundance uncer- 
tainty distribution that was either normal or log-normal. 

We show a selection of illustrative examples in Figure |4] 
and we plot Gaussian, log-normal and exponential-normal 



fits to the distributions shown where possible. In Table [3] we 
give the parameters of the fits to the plotted distributions. 
The RMS of the residuals is given as a quantitative measure 
of which distribution is a better fit to the data. 

The finding that the majority of uncertainty distribu- 
tions are best described by log-normal distributions demon- 
strates that analytic uncertainty propagation generally does 
not accurately quantify the true uncertainties arising from 
real measurements of the spectra of astronomical objects. 
It also implies that temperatures, densities and abundances 
are generally more likely to be underestimated than over- 
estimated, although this effect will generally be small. The 
emergence of unquantifiable distributions shows that ana- 
lytic techniques can break down very severely; most signifi- 
cantly there is on the face of it no obvious difference between 
the line lists that give generally log-normal final uncertain- 
ties, and those that give unquantifiable distributions. Thus, 
it appears that there is no a priori way of telling whether 
the uncertainties are going to be well behaved or erratic. 



5 THE RP EFFECT 

As discussed earlier, neat accounts for the RP effect, in 
which fluxes measured from lines with SNR<6 are generally 
overestimated, with the magnitude of the effect increasing 
as SNR^l. In this section we show the importance of this 
effect and demonstrate that flawed results will inevitably 
result if the effect is ignored. 

To investigate the magnitude of this effect, we reanal- 
ysed two line lists - spectra of NGC 6543, the Cat's Eye Neb- 
ula, presented by Wesson & Liu (2004), and spectra of the 
Orion Nebula presented by Esteban et al. (2004). In both 



cases we ran two instances of neat, one in which the RP 
effect was ignored, and all line flux uncertainties quoted in 
the two papers were assumed to represent Gaussian proba- 
bility distributions, and the second in which the RP effect 
was accounted for as described in Section 13.1.31 

One important and as yet unresolved issue in nebular 
abundance studies is the so-called abundance discrepancy 
problem (see for example Liu ( 2006 ) for a review) . The RP 
effect can cause errors in the assessment of the magnitude 
of the discrepancy; recombination lines of heavy elements 
are much weaker than the collisionally excited lines of the 
same species, and are thus measured with lower signal to 
noise ratios. In almost any real astronomical spectra, re- 
gardless of the number of lines detected, the weakest lines 
measured will be subject to the RP effect, and for deep spec- 
tra of photoionized nebulae, the weakest lines will almost all 
be recombination lines. Thus, recombination line abundance 
measurements may be subject to an upward bias that colli- 
sionally excited line abundances are largely free from. 

Figure [5] shows that in these two nebulae, ignoring the 
RP effect indeed has no effect on abundances of nitrogen, 
oxygen and neon derived from collisionally excited lines, but 
in all cases leads to an overestimate of the abundances de- 
rived from the recombination lines of these elements. Fur- 
thermore, it turns out that properly accounting for the non- 
Gaussian nature of the uncertainties on weak lines leads to 
a significant reduction in the uncertainty associated with 
abundances determined from them; we fit the resulting un- 
certainty histograms with Gaussian functions, and find that 
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Nebula 



Quantity 



Gaussian 



M 



RMS of residuals 



Log-normal 
a RMS of residuals 



Reference 



Orion He/H 0.096 

BAT99-11 He/H 0.086 

Cn3-1 Ar/H 2.91x10" 

NGC6803 T([0 III]) 9410 



0.004 14.0 


-1.019 


0.041 13.5 


(1) 


0.014 23.8 


-1.073 


0.159 14.7 


(2) 


1.20x10-^ 41.1 


-12.87 


0.410 11.1 


(3) 


Gaussian 




Exponential-normal 




a RMS of residuals 


u 


(J RMS of residuals 





65 



18.4 



12231.8 795.0 15.5 



(3) 



Table 3. Analytic fits to the uncertainty distributions shown in Figure p] Log-normal distributions emerge much more frequently than 
Gaussian or unquantifiable distributions. The line lists analysed are from (1) Esteban et al. (2004), (2) Stock, Barlow &; Wesson (2011), 
(3) Wesson, Liu & Barlow (2005). The exponential-normal parameters for NGC 6803 are those resulting from fitting the uncertainty 
distribution with a function in which exp(Te/1000) was normally distributed. 



accounting for the RP effect reduces the relative standard 
deviation on the measured abundances by 25-30%. Table [4] 
summarises the mean and standard deviation of the final 
abundance determinations from recombination lines. 

In other shallower spectra, it may often be the case 
that the auroral Unes [N ii] A5754 and [O ill] A4363 are 
measured with low enough SNR that they become subject to 
the RP effect. In this case, the derived temperatures would 
be overestimated, and collisionally excited line abundances 
underestimated. To investigate this effect, we analysed the 
line list of H1013, an extragalactic H II region in the spiral 
galaxy MIDI, presented by Esteban et al. (2009). In this 



nebula, the [O ill] line at 4363A is detected with an SNR of 
only 2.7. 

Figure |6] shows that significant systematic uncertainties 
are produces when the RP effect on temperature diagnostic 
lines is ignored. When the effect is neglected, we determine 
an [O III] temperature of 7480±610K, in very close agree- 
ment with the value of 7370±630 K reported by |Esteban| 
et al. (2009). However, when we account for the RP effect. 



we find a value of 6840±390K. Similarly for abundances, 
considering 0^^/H^, we find that by neglecting the RP ef- 
fect we obtain a value of 7.95=b0.15 (on a logarithmic scale 
wh ere N(H) = 12) close to the value of 8.05=b0.12 obtained 
by Esteban et al.| ( |2009 ) . Accounting for the effect yields a 
valueof 8.16±0.12. 

The reduction in uncertainty when accounting for the 
RP effect arises in two distinct ways; firstly, as can be seen 
in Figure p] the probability distribution for the [O ill] tem- 
perature when the RP effect is not accounted for is neither 
normal nor log-normal but is instead better described by 
an exponential-normal distribution. When the input uncer- 
tainty distribution is correctly characterised as a log-normal 
distribution, the convolution of this distribution with the 
processes which give rise to the exponential-normal distri- 
bution of temperature probabilities results in a final distri- 
bution which is narrower than when the input distribution 
is assumed to be normal. 

The second effect is that when abundances from many 
weak lines are being combined to derive an abundance, ac- 
counting for the RP effect results in a modest reduction in 
the scatter of the derived abundances, and a corresponding 
reduction in the overall uncertainty on the combined abun- 
dance. 

We re-emphasise, therefore, that neglecting this effect 
results in incorrect abundances. Accounting for it removes a 



source of systematic uncertainty, and reduces the statistical 
uncertainty of the abundances determined. 

One can see in Figure [5] that abundances derived from 
recombination lines have similar or smaller uncertainty dis- 
tributions than those derived from collisionally excited lines, 
even though the line fluxes may be several orders of mag- 
nitude lower. This reflects their very weak dependence on 
temperature and density; the uncertainties on the adopted 
temperatures and densities hardly propagate into RL abun- 
dances but have a significant effect on the GEL abundances. 



6 INTERSTELLAR EXTINCTION 

NEAT allows a robust propagation of uncertainties from line 
flux measurements into derived quantities. It is also possible 
to investigate the effect of statistical uncertainties arising at 
different stages of the process. In this section, we consider 
the effect of the uncertainty in R, the ratio of total to selec- 
tive extinction given by 



R = 



A{V) 
E{B - V) 



(15) 



It is well known that R varies along different sight lines 



(eg Valencic et al. (2004), Larson fc Whittet (2005)), but 



determining its value for particular objects is generally im- 
practical and instead, it is commonly assumed to equal 3.1. 
We investigate the effect of an uncertainty in this value by 
comparing analyses in which R is fixed to be 3.1, and in 
which R is drawn from a Gaussian distribution with /i=3.1 
and cr=0.15. For this investigation we used the R-dependent 
extinction law parametrization of |Cardelli et al.] ( |1989| . We 
took the uncertainty as the largest value that we found 



quoted in the literature for the diffuse ISM ( Larson & Whit- 
tetl|2Q05t . 

We analysed the emission line measurements of NGC 
7026 presented by Wesson et al. (2005), including the line 
flux uncertainties which were not published in that pa- 
per. We chose this object as it is significantly reddened 
(c(H/3)=1.0), and has many very well detected lines in its 
spectrum (142 lines measured, 120 with SNR>3 and 65 with 
SNR>10). This combination should maximise the effect of 
an uncertainty on R, relative to the effect of the line flux 
measurement uncertainties. 

We find that including the effect of an uncertainty in 
R has a noticeable effect on the probability distribution of 
dereddened line fluxes. However, in the conversion from line 
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Figure 4. Some representative examples of probability distributions emerging from this analysis. These results are from analyses of 
the Orion Nebula (top left), WR nebula BAT99-11 (top right), Cn3-1 (middle left), NGC 6803 (middle right), Sp 4-1 (bottom left) and 
DdDm 1 (bottom right). 



fluxes into physical quantities, the statistical uncertainties 
arising from line flux measurements completely dominate, 
and the probability distributions are statistically identical 
whether R is assumed to be fixed or allowed to vary. This 
result is shown in Figure [7| where we plot the probability 
distributions for the [Ne ill] 3868A dereddened line flux, and 
the abundance derived from it. The effect on the derived 
abundance of the assumed uncertainty on R is negligible. 



7 DISCUSSION 

Motivated by the necessity of properly understanding 
sources of uncertainties in analyses of photoionized nebu- 
lae, we have presented a new code for calculating chemi- 
cal abundances in photoionized nebulae, which also robustly 
calculates the statistical uncertainties on the abundances de- 
termined. The code is freely available and we welcome bug 
reports and feature requests. 

Analytic methods of uncertainty propagation rely on 
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Element lO'^ x X/H (RP ignored) 10^ x X/H (RP included) 



cr / iJi{RP included) 
a / fi{RPignored) 



N 


2.74 ± 0.16 


O 


7.04 ± 0.63 


Ne 


1.52 ± 0.26 


N 


6.93 ± 0.56 


O 


14.94 ± 1.15 


Ne 


4.17 ± 0.94 



Orion 
2.37 ± 0.11 
6.63 ± 0.43 
1.20 ± 0.17 
NGC 6543 
5.83 ± 0.37 
12.48 ± 0.74 
3.23 ± 0.56 



0.79 
0.73 
0.83 

0.78 
0.77 
0.77 



Table 4. Results of heavy element abundance determinations from recombination lines, using the published line lists of [Wesson &: Liu| 
( |2004| > and |Esteban et al.| ( [2004| for the Orion Nebula. 



assumptions that typically do not hold for real astronom- 
ical data, and therefore we have adopted a Monte Carlo 
approach. We have shown that the analytic approach does 
not give a good estimate of the true statistical uncertainties. 

We have provided in the code a means for accounting for 
the well known upward bias in flux measurements of weak 
lines, and we show that doing so results in reduced statistical 
as well as systematic uncertainties in abundance determina- 
tions. This effect should not be ignored in any analyses of 
emission line spectra; in almost any data set, the weakest 
lines will be subject to the effect, and unless these lines are 
ignored in the analysis, then incorrect results will be ob- 
tained if the RP effect is not properly accounted for. If, as 
is commonly the case, recombination lines make up the ma- 
jority of the weak lines, then systematic misunderstandings 
may arise if the effect is neglected. In the examples that we 
have analysed, however, this effect is too small to account for 
the magnitude of the discrepancy typically found between 
RL and CEL abundance determinations. 

Finally we have shown that possible uncertainties in the 
value of R have a negligibly small effect on the uncertainties 
on the final quantities. 

In the present analysis we have considered only sta- 
tistical uncertainties. The correct propagation of these, as 
we have seen, can reduce their final magnitude. However, 
results obtained using neat are of course subject to system- 
atic uncertainties as well. These have many potential ori- 
gins: systematic uncertainties in atomic data and the choice 
of atomic data to use for each ion; the choice of interstellar 
extinction law; the choice of ionization correction scheme; 
the methodology of calculating the temperatures and den- 
sities to use for the abundance calculations; unjustified as- 
sumptions in the analysis such as assumptions of constant 
temperatures and densities; correction for underlying stellar 
absorption in certain types of nebula; and others that may 
yet be unknown. As [Kwitter fc Henry| ( |2QTT] ) point out, in- 
vestigators starting from an identical line list may arrive at 
quite different results, depending on what choices they make 
regarding the sources of these systematic uncertainties. 

Two questions which arise regarding these uncertainties 
are 

(i) Which systematic choices are the most important? 
(ii) What is the statistical significance of the systematic 
uncertainties? 

Through a proper understanding of statistical uncer- 
tainties, it may be possible to answer these questions. For 



example, by comparing the systematic uncertainty intro- 
duced by varying the choice of reddening law with the sta- 
tistical uncertainty arising from line flux measurement, one 
could determine quantitatively whether or not the choice of 
reddening law is crucial or relatively inconsequential. In a 
forthcoming paper we plan to extend this type of analysis 
to quantify the effects of systematic choices in terms of sta- 
tistical uncertainties, and thus to be able to determine the 
relative importance of each of the systematic choices being 
made. 
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Figure 5. The RP effect on abundance determinations. Dashed hnes show abundances derived assuming that weak hue fluxes have 
a gaussian uncertainty distribution. Sohd hnes show the results obtained by more reahsticaUy assuming a log-normal distribution as 
described in Sectionjs] Thin red and thick blue lines correspond to collisionally excited line and recombination line abundances respectively. 
Only the far weaker RLs are subject to the RP effect. 
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Figure 6. The RP effect when CEL temperature diagnostic hnes are measured with low SNR. The figure shows a reanalysis of data for 
the H II region H1013, showing the RP effect on the derived [O ill] temperature (1) and 0^+/H+ abundance (r). 
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Figure 7. The effect of an assumed uncertainty in R, the ratio of selective to total extinction, on the dereddened line fiux of the [Ne ill] 
line at 3868 A (1) and derived Ne2+/H+ abundance (r). 
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